Abstract —Systematic polar codes have been shown to outperform
non-systematic polar codes in terms of bit-error-rate (BER)
performance. However, the mechanism behind this better
performance is not yet theoretically clear.
In this paper, we set up a theoretical framework to analyze the
performance of systematic polar codes. The exact evaluation of
the BER of systematic polar codes conditioned on the BER of
non-systematic polar codes involves $2^{NR}$ terms, where $N$ is the
code block length and $R$ is the code rate, resulting in a prohibitive
number of computations for large block lengths. By analyzing
the polar code construction and the successive-cancellation (SC)
decoding process, we use a statistical model to quantify the
advantage of systematic polar codes over non-systematic polar
codes, which we call the systematic gain in this paper. A composite
model is proposed to approximate the dominant error cases in
the SC decoding process. This composite model divides the errors
into independent regions and coupled regions, controlled by a
coupling coefficient. Based on this model, the systematic gain can
be conveniently calculated. Numerical simulations show that the
proposed model approximates the systematic gain very closely.

Index Terms —Polar Codes, Systematic Polar Codes, Polar 
Codes Encoding, Successive Cancellation Decoding, Systematic 
Polar Gain 


I. Introduction 

Polar codes were introduced by Arikan in [1].
It is shown there that polar codes can achieve the capacity
of symmetric binary-input discrete memoryless channels (B-DMC)
with low complexity. The encoding and decoding
processes (with successive cancellation, SC) can be implemented
with a complexity of $O(N \log N)$. The polarization of $N$
channels is realized through two stages: channel combining
and splitting. Channels are polarized after these two stages
in the sense that bits transmitted over these channels
experience either almost noiseless channels or almost completely
noisy channels for large $N$. The idea of polar codes is to
transmit information bits on the noiseless channels while
fixing the bits on the completely noisy channels.
The fixed bits are made known to both the transmitter and the
receiver. The binary input alphabet in Arikan's seminal work
[1] was later extended to non-binary input alphabets [2]-[4].
The construction of polar codes has since been investigated
and different procedures have been proposed Q-ia assuming the
original 2x2 kernel matrix. Polar codes based on kernel
matrices of size $\ell \times \ell$ are studied in E). Polar codes have also
been extended to different scenarios since then Qol-lll.

The rate of polarization of polar codes is studied in IT], [13]
without including the effect of the code rate. In Ci¬lia, the
authors analyzed the polarization rate considering
the effect of both the block length and the code rate. The
asymptotic behavior of polar codes reported in these works
does not guarantee good performance in practice when a
finite block length is used. In fact, the performance of
polar codes with SC decoding and finite block lengths
is not satisfactory El E). Different decoding techniques
have been deployed to improve the performance of polar codes El-[24].
The authors of El-El use belief propagation (BP) in
the decoding process in place of SC decoding. The list
decoding procedure of [22] and [23] involves multiple paths
instead of a single path as in the SC decoding process. The
concatenation of polar codes with LDPC codes is proposed
in [20] and [24] to further improve the performance of polar
codes. These techniques focus on improving the
decoding algorithms while keeping the original coding process
as in [1]. The price paid for these improvements is extra
decoding complexity.

Another direction to improve the performance of polar codes
was also introduced by Arikan in [25], by using systematic polar
codes. Denote by $u$ a vector containing the source bits and by $x$
the corresponding codeword obtained by the normal
polar code construction. Note that in this paper we use non-systematic
polar codes and normal polar codes interchangeably
without further notice. The basic idea of systematic polar
codes is to use some part of the codeword $x$ to transmit
the information bits instead of directly using $u$ to transmit them.
The advantage of systematic polar codes is the low decoding
complexity: systematic polar codes require only part of the
encoding process (involving only 0s and 1s) after the normal
SC decoding is done. This low complexity can be seen from
the way $x$ is estimated: $\hat{x} = \hat{u}G$, where $\hat{u}$ is the estimate
of $u$ from the normal SC decoding and $G$ is the generator
matrix. In the rest of the paper, we call this indirect, two-step
(SC decoding, then encoding) decoding process of systematic
polar codes the SC-EN decoding.

In [25], it is shown that systematic polar codes achieve better
bit-error-rate (BER) performance than normal polar codes.
However, Arikan also noted in [25] that it is not clear why
systematic polar codes achieve better BER performance than
non-systematic polar codes even with an indirect decoding
procedure (the SC-EN decoding): first decoding $\hat{u}$, then
re-encoding $\hat{x}$ as $\hat{u}G$. One would expect that any error in $\hat{u}$
would be amplified by this re-encoding process $\hat{x} = \hat{u}G$.
However, simulation results in [25] as well as simulation results
in this paper show that with this two-step decoding procedure,
systematic polar codes still achieve better BER performance
than non-systematic polar codes.

This paper studies the error performance of systematic
polar codes, with special focus on characterizing the advantage
of systematic polar codes over non-systematic polar codes.
We start by simplifying the general encoding process of
systematic polar codes. This is done by proving a theorem
on the structure of the generator matrix. Then we discuss
the theoretical BER performance of systematic polar codes
conditioned on the BER performance of non-systematic polar
codes. The general form of this error prediction involves
$2^{NR}$ terms, which is prohibitive to compute for large block
lengths $N$. It is then proven that for two special cases we
can theoretically predict the error rate of systematic polar
codes. To understand the generally better behavior of systematic
polar codes, we further study the basic error patterns of
non-systematic polar codes with SC decoding. A systematic
gain is defined to describe the advantage of systematic polar
codes over non-systematic polar codes. A composite model
is proposed to approximate the mean effect (or the dominant
effect) of the error events. This composite model uses the fact
that the errors in the SC decoding process are coupled. A
coupling coefficient is used to control the level of coupling
between the errors. This model facilitates the calculation of
the systematic gain and can be used to predict the performance
of systems utilizing systematic polar codes.

Following the notations in [1], in this paper we use $v_1^N$ to
represent a row vector with elements $(v_1, v_2, \ldots, v_N)$. We also
use $v$ to represent the same vector for notational convenience.
Given a vector $v_1^N$, the vector $v_i^j$ is a subvector $(v_i, \ldots, v_j)$
with $1 \le i, j \le N$. If there is a set $\mathcal{A} \subseteq \{1, 2, \ldots, N\}$, then
$v_{\mathcal{A}}$ denotes a subvector with elements in $\{v_i, i \in \mathcal{A}\}$.

The rest of the paper is organized as follows. In Section
II, the background of systematic polar codes is introduced
and a theorem on the structure of systematic polar codes is
proven. The first part of Section III provides a general theoretical
formulation of the BER performance of systematic polar codes
given the BER performance of non-systematic polar codes.
Two special cases whose BER performance can be characterized
are analyzed in this part. Section IV studies the basic
error patterns and the first error distribution of non-systematic
polar codes, followed by the introduction of the systematic
gain. In Section V, we propose a coupling model which is
used to predict the BER performance of systematic polar
codes. Simulation results are given in Section VI. Concluding
remarks are presented in Section VII.

II. Systematic Polar Codes 

For completeness, in the first part of this section, we restate
the relevant material on the construction of normal polar
codes and systematic polar codes from [1], [25]. In the second
part of this section, a theorem on the structure of normal
polar codes is provided, which is used to simplify the encoding
of systematic polar codes.


A. Preliminaries of Non-Systematic Polar Codes 

Let $W$ be any binary discrete memoryless channel (B-DMC)
with transition probability $W(y|x)$. The input alphabet $\mathcal{X}$
takes values in $\{0,1\}$ and the output alphabet is $\mathcal{Y}$. Channel
polarization is carried out in two phases: channel combining and
splitting. $N = 2^n$ ($n \ge 1$) independent copies
of $W$ are first combined and then split into $N$ bit channels
$\{W_N^{(i)}\}_{i=1}^{N}$. This polarization process has the recursive tree
structure of [1], which we plot in Fig. 1 for ease of reference.
The 0s and 1s in Fig. 1 refer to the bit channels $W'$ and
$W''$, respectively, in the basic one-step channel transformation
defined as $(W, W) \mapsto (W', W'')$, where

$W'(y_1, y_2 | u_1) = \sum_{u_2} \frac{1}{2} W(y_1 | u_1 \oplus u_2) W(y_2 | u_2)$   (1)

$W''(y_1, y_2, u_1 | u_2) = \frac{1}{2} W(y_1 | u_1 \oplus u_2) W(y_2 | u_2)$   (2)

The Bhattacharyya parameters of the channels $W$, $W'$ and $W''$ satisfy
the following conditions:

$Z(W'') = Z(W)^2$   (3)

$Z(W') \le 2Z(W) - Z(W)^2$   (4)

$Z(W') \ge Z(W) \ge Z(W'')$   (5)

The label 0 (the upper branch in the transformation) in
Fig. 1 means that the output channel takes the branch $W'$
in that specific transformation. Correspondingly, a label 1 (the
lower branch in the transformation) means the output channel
takes $W''$ in that transformation. Note that for binary erasure
channels (BEC), the Bhattacharyya parameter $Z(W')$ has the
exact expression $Z(W') = 2Z(W) - Z(W)^2$, resulting in a
recursive calculation of the Bhattacharyya parameters of the
final bit channels. Finally, after the channel transformations,
the transition probability for bit channel $i$ is defined as

$W_N^{(i)}(y_1^N, u_1^{i-1} | u_i) = \sum_{u_{i+1}^N} \frac{1}{2^{N-1}} W^N(y_1^N | u_1^N G)$   (6)

where $W^N(\cdot)$ is the underlying vector channel ($N$ copies of
the channel $W$) and $G$ is the generator matrix whose form is
to be discussed in the next section.
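The recursive BEC calculation mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `bec_bhattacharyya` is hypothetical, and the index convention (the first split decides the most significant bit of $i-1$) is an assumption, chosen because it reproduces the index set quoted later in the $N = 16$ example.

```python
def bec_bhattacharyya(n, eps):
    """Bhattacharyya parameters of the N = 2**n bit channels of a BEC(eps).

    Assumed ordering: entry i-1 of the result belongs to bit channel i,
    and the first split decides the most significant bit of i-1.
    """
    z = [eps]
    for _ in range(n):
        # Each channel splits into W' (label 0, Z = 2z - z^2, worse)
        # and W'' (label 1, Z = z^2, better); see (3) and the BEC form of (4).
        z = [child for parent in z
             for child in (2 * parent - parent ** 2, parent ** 2)]
    return z


z = bec_bhattacharyya(4, 0.4)
for i, zi in enumerate(z, start=1):
    print(f"bit channel {i:2d}: Z = {zi:.6f}")
```

For a BEC the two children of a channel have Bhattacharyya parameters that sum to twice the parent's, so the total over all bit channels stays at $N$ times the erasure probability, which gives a quick sanity check on the recursion.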

B. Construction of Systematic Polar Codes 

Polar codes in the original format [1] are not systematic.
The generator matrix for polar codes is $G_P = BF^{\otimes n}$ in [1],
where $B$ is a permutation matrix and $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. The operation
$F^{\otimes n}$ is the $n$th Kronecker power of $F$ over the binary field
$\mathbb{F}_2$. For systematic polar codes, we focus on a generator matrix
without the permutation matrix $B$, namely $G = F^{\otimes n}$. With
such a matrix, the encoding for normal polar codes is done as
$x = uG$.
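As a minimal sketch of this encoding (the helper names `polar_generator` and `encode` are hypothetical, and the input vector is arbitrary), the Kronecker power can be built recursively and applied over $\mathbb{F}_2$:

```python
def polar_generator(n):
    """G = n-th Kronecker power of F = [[1, 0], [1, 1]], as a list of rows."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        # Kronecker step F (x) g = [[g, 0], [g, g]].
        top = [row + [0] * size for row in g]
        bottom = [row + row for row in g]
        g = top + bottom
    return g


def encode(u, g):
    """Non-systematic encoding x = uG over GF(2)."""
    n_cols = len(g[0])
    return [sum(u[i] * g[i][j] for i in range(len(u))) % 2
            for j in range(n_cols)]


G = polar_generator(3)
u = [0, 0, 0, 1, 0, 1, 1, 1]
x = encode(u, G)
print("u =", u)
print("x =", x)
```

With 0-based indices, entry $(i, j)$ of this matrix is 1 exactly when every 1-bit of $j$ is also a 1-bit of $i$, so $G$ is lower triangular, and encoding twice returns the original vector because $F^2 = I$ over $\mathbb{F}_2$.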

The indices of the source bits $u$ corresponding to the
information bits can be set by selecting the indices of the bit
channels with the smallest Bhattacharyya parameters. Denote
$\mathcal{A}$ as the set consisting of the indices of the information bits.
Correspondingly, $\bar{\mathcal{A}}$ consists of the indices of the frozen bits. Both
sets $\mathcal{A}$ and $\bar{\mathcal{A}}$ are in $\{1, 2, \ldots, N\}$. For any element $i \in \mathcal{A}$ and
$j \in \bar{\mathcal{A}}$, we have $Z(W_N^{(i)}) < Z(W_N^{(j)})$. In this paper, the set
$\mathcal{A}$ is always sorted in ascending order according to the index
values, instead of being ordered by the values of the Bhattacharyya
parameters.

Fig. 1. The recursive channel transformation of polar codes.

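The source bits $u$ can be split as $u = (u_{\mathcal{A}}, u_{\bar{\mathcal{A}}})$. The
codeword can then be expressed as $x = u_{\mathcal{A}} G_{\mathcal{A}} + u_{\bar{\mathcal{A}}} G_{\bar{\mathcal{A}}}$,
where $G_{\mathcal{A}}$ is the submatrix of $G$ with rows specified by the
set $\mathcal{A}$. The systematic polar code is constructed by specifying
a set of indices of the codeword $x$ as the indices that convey the
information bits. Denote this set as $\mathcal{B}$ and the complementary
set as $\bar{\mathcal{B}}$. The codeword $x$ is thus split as $(x_{\mathcal{B}}, x_{\bar{\mathcal{B}}})$. With some
manipulations, we have

$(x_{\mathcal{B}}, x_{\bar{\mathcal{B}}}) = (u_{\mathcal{A}} G_{\mathcal{A}\mathcal{B}} + u_{\bar{\mathcal{A}}} G_{\bar{\mathcal{A}}\mathcal{B}},\; u_{\mathcal{A}} G_{\mathcal{A}\bar{\mathcal{B}}} + u_{\bar{\mathcal{A}}} G_{\bar{\mathcal{A}}\bar{\mathcal{B}}})$   (7)

The matrix $G_{\mathcal{A}\mathcal{B}}$ is a submatrix of the generator matrix
with elements $\{G_{i,j}\}_{i \in \mathcal{A}, j \in \mathcal{B}}$. Given a non-systematic encoder
$(\mathcal{A}, u_{\bar{\mathcal{A}}})$, there is a systematic encoder $(\mathcal{B}, u_{\bar{\mathcal{A}}})$ which performs
the mapping $x_{\mathcal{B}} \mapsto x = (x_{\mathcal{B}}, x_{\bar{\mathcal{B}}})$. To realize this systematic
mapping, $x_{\bar{\mathcal{B}}}$ needs to be computed for any given information
bits $x_{\mathcal{B}}$. To this end, we see from (7) that $x_{\bar{\mathcal{B}}}$ can be computed
if $u_{\mathcal{A}}$ is known. The vector $u_{\mathcal{A}}$ can be obtained as
follows:

$u_{\mathcal{A}} = (x_{\mathcal{B}} - u_{\bar{\mathcal{A}}} G_{\bar{\mathcal{A}}\mathcal{B}}) (G_{\mathcal{A}\mathcal{B}})^{-1}$   (8)

From (8), it is seen that $x_{\mathcal{B}} \mapsto u_{\mathcal{A}}$ is one-to-one if $x_{\mathcal{B}}$ has
the same number of elements as $u_{\mathcal{A}}$ and if $G_{\mathcal{A}\mathcal{B}}$ is invertible. In [25],
it is shown that $\mathcal{B} = \mathcal{A}$ satisfies all these conditions and thus
establishes the one-to-one mapping $x_{\mathcal{B}} \mapsto u_{\mathcal{A}}$. In the rest of
the paper, the systematic encoding of polar codes adopts this
selection, $\mathcal{B} = \mathcal{A}$. Therefore we can rewrite (7) as

$(x_{\mathcal{A}}, x_{\bar{\mathcal{A}}}) = (u_{\mathcal{A}} G_{\mathcal{A}\mathcal{A}} + u_{\bar{\mathcal{A}}} G_{\bar{\mathcal{A}}\mathcal{A}},\; u_{\mathcal{A}} G_{\mathcal{A}\bar{\mathcal{A}}} + u_{\bar{\mathcal{A}}} G_{\bar{\mathcal{A}}\bar{\mathcal{A}}})$   (9)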


C. Theorem on Polar Coding Construction 

In this section, we prove a general theorem on polar codes.
In the following, we say that row $i$ intersects with column $j$
of the matrix $G$ if $G_{i,j} = 1$. Otherwise, we say row $i$ does
not intersect with column $j$.

Theorem 1: For all $j \in \mathcal{A}$ and all $i \in \bar{\mathcal{A}}$, row $i$ does not intersect
with column $j$. In other words, $G_{i,j} = 0$ if $j \in \mathcal{A}$ and $i \in \bar{\mathcal{A}}$.

Proof: For any given index $j \in \mathcal{A}$, we divide the elements
of $\bar{\mathcal{A}}$ into two sets: $\bar{\mathcal{A}}_1 = \{i : i \in \bar{\mathcal{A}}, i < j\}$ and $\bar{\mathcal{A}}_2 = \{i :
i \in \bar{\mathcal{A}}, i > j\}$. For $i \in \bar{\mathcal{A}}_1$, it is obvious that $G_{i,j} = 0$ since
the matrix $G$ is lower triangular. So we only need to prove
$G_{i,j} = 0$ for $i \in \bar{\mathcal{A}}_2$.

Let $(b_n^i, b_{n-1}^i, \ldots, b_1^i)$ be the $n$-bit binary expansion of the
integer $i - 1$ with $i \in \bar{\mathcal{A}}_2$, where $b_1^i$ is the MSB. The bit $b_n^i$
corresponds to the root channel selection in Fig. 1 and
$b_1^i$ corresponds to the last channel selection. Each bit in the
binary vector $(b_n^i, b_{n-1}^i, \ldots, b_1^i)$ defines a channel selection at
the corresponding level in the tree of Fig. 1. For example, bit
$b_m^i$ ($m \in \{1, 2, \ldots, n\}$) determines whether bit channel $i$ at level $m$
takes the upper branch or the lower branch.

Suppose row $i$ intersects with column $j \in \mathcal{A}$, which is equivalent to
$G_{i,j} = 1$. We know that the entries of the generator matrix $G$ can
be calculated as in [1]

$G_{i,j} = \prod_{m=1}^{n} (1 \oplus b_m^j \oplus b_m^i b_m^j)$   (10)

To have $G_{i,j} = 1$, we must have $b_m^i = 1$ whenever $b_m^j = 1$.
Suppose $M_j$ is the last non-zero position of $(b_n^j, b_{n-1}^j, \ldots, b_1^j)$
and $M_i$ is the last non-zero position of $(b_n^i, b_{n-1}^i, \ldots, b_1^i)$. With
$i \in \bar{\mathcal{A}}_2$, $M_i \ge M_j$. We proceed by discussing two cases:
$M_i = M_j$ and $M_i > M_j$.

1) Case 1, $M_i = M_j$: For $M_i = M_j$, we have $b_m^i = b_m^j = 0$ for $m > M_j$.
Referring to Fig. 1, it is seen that the recursive channel
transformation from level $n$ to level $M_j + 1$ (or $M_i + 1$) is the
same for both bit channel $i$ and bit channel $j$: they both take
the upper branch in each transformation. Then at level $M_j$,
both channels evolve in the same fashion by taking the lower
branch (corresponding to $b_{M_j}^i = b_{M_j}^j = 1$). Divide the levels
$\{m : m < M_j\}$ of bit channel $j$ into two sets:

$\mathcal{M}_0 = \{m : m < M_j \text{ and } b_m^j = 0\}$   (11)

$\mathcal{M}_1 = \{m : m < M_j \text{ and } b_m^j = 1\}$   (12)

With $b_m^i = 1$ whenever $b_m^j = 1$, we can equivalently express
$\mathcal{M}_1$ as

$\mathcal{M}_1 = \{m : m < M_j \text{ and } b_m^i = b_m^j = 1\}$   (13)

Define a set $\mathcal{M}_{01} = \{m : m \in \mathcal{M}_0 \text{ and } b_m^i = 1\}$. This set
$\mathcal{M}_{01}$ is not empty: there must be at least one $m' \in \mathcal{M}_0$ at
which $b_{m'}^i = 1$, since $i > j$, $M_i = M_j$, and $b_m^i = 1$ whenever
$b_m^j = 1$ for $m \in \mathcal{M}_1$. When $|\mathcal{M}_{01}| > 1$, we select $m'$ to be
the largest in $\mathcal{M}_{01}$. At level $m'$, bit channel $i$ takes the lower
branch (corresponding to $b_{m'}^i = 1$) and bit channel $j$ takes the
upper branch (corresponding to $b_{m'}^j = 0$). Therefore, starting
from level $m'$, the Bhattacharyya parameters of bit channel $i$
and bit channel $j$ diverge according to (3) and (5):


$Z(W_{2N'}^{(k_i)}) = Z(W_{N'}^{(k)})^2$   (14)

$Z(W_{2N'}^{(k_j)}) \ge Z(W_{N'}^{(k)})$   (15)

where

$N' = 2^{n-m'}$   (16)

$k = (b_n^j, b_{n-1}^j, \ldots, b_{m'+1}^j)$   (17)

$k_i = (b_n^i, b_{n-1}^i, \ldots, b_{m'+1}^i, 1)$   (18)

$k_j = (b_n^j, b_{n-1}^j, \ldots, b_{m'+1}^j, 0)$   (19)

The numbers $k_i$ and $k_j$ are the channel indices of bit channel $i$
and bit channel $j$ at level $m'$, respectively. Starting from the
same previous channel $W_{N'}^{(k)}$, it is obvious that bit channel
$i$ has a smaller Bhattacharyya parameter than bit channel $j$ at
level $m'$. For levels $m < m'$, this advantage of bit channel $i$
continues until the last level because of the recursive channel
transformation process defined by the sets $\mathcal{M}_0$ and $\mathcal{M}_1$ in (11)
and (12). Therefore, if $G_{i,j} = 1$ and $M_i = M_j$, we have
$Z(W_N^{(i)}) < Z(W_N^{(j)})$ for $j \in \mathcal{A}$ and $i \in \bar{\mathcal{A}}$.

2) Case 2, $M_i > M_j$: In this case, we define a set $\mathcal{M}_{11} =
\{m : m > M_j \text{ and } b_m^i = 1\}$. This set $\mathcal{M}_{11}$ is obviously not
empty since $M_i > M_j$. But the set $\mathcal{M}_{01}$ could be empty in this
case. If $\mathcal{M}_{01} = \emptyset$, the recursive channel transformation for bit
channels $i$ and $j$ is the same for levels $\{m \le M_j\}$: they take
the upper branches at levels in $\mathcal{M}_0$ and the lower branches
at levels in $\mathcal{M}_1$. However, their evolving processes differ in
at least one level $m' \in \mathcal{M}_{11}$ because the set $\mathcal{M}_{11}$ is
non-empty. When $|\mathcal{M}_{11}| > 1$, we select $m'$ to be the
smallest in $\mathcal{M}_{11}$. Bit channel $i$ takes the lower branch at level
$m'$ while bit channel $j$ takes the upper branch at the same
level, resulting in a smaller Bhattacharyya parameter for bit
channel $i$ at that level. After level $m'$, as already pointed out,
the two channels evolve in the same fashion defined by $\mathcal{M}_0$
and $\mathcal{M}_1$. Therefore the final Bhattacharyya parameter of bit
channel $i$ is still smaller than that of bit channel $j$. If $\mathcal{M}_{01} \ne \emptyset$, the
advantage of bit channel $i$ is even more pronounced than
in the case $\mathcal{M}_{01} = \emptyset$, since bit channel $i$ takes additional
lower branches besides taking the same lower branches as bit
channel $j$, producing a final channel with an even smaller
Bhattacharyya parameter. Therefore, as in the case $M_i =
M_j$, we also have $Z(W_N^{(i)}) < Z(W_N^{(j)})$ for $j \in \mathcal{A}$ and $i \in \bar{\mathcal{A}}$
when $M_i > M_j$.

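Combining Case 1 and Case 2, we see that if $G_{i,j} = 1$, then
$Z(W_N^{(i)}) < Z(W_N^{(j)})$ for $j \in \mathcal{A}$ and $i \in \bar{\mathcal{A}}$. But this contradicts
the polar encoding principle that $Z(W_N^{(i)}) > Z(W_N^{(j)})$ for
$j \in \mathcal{A}$ and $i \in \bar{\mathcal{A}}$. Therefore $G_{i,j} = 0$ for $j \in \mathcal{A}$ and $i \in \bar{\mathcal{A}}$.

■

Corollary 1: The matrix $G_{\bar{\mathcal{A}}\mathcal{A}} = 0$.

Proof: The statement $G_{\bar{\mathcal{A}}\mathcal{A}} = 0$ is equivalent to saying
that any column $j \in \mathcal{A}$ of the generator matrix $G$ does not
intersect with any row $i \in \bar{\mathcal{A}}$ of $G$, which we already proved in
Theorem 1. ■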
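Theorem 1 and Corollary 1 can be checked numerically for small block lengths. The sketch below is an illustration under stated assumptions, not part of the proof: it assumes a BEC, the helper names are hypothetical, indices are 0-based inside the code, and the information set is taken as the $K$ indices with the smallest Bhattacharyya parameters.

```python
def bec_bhattacharyya(n, eps):
    """Z of the 2**n bit channels of a BEC(eps); first split = MSB of i-1 (assumed)."""
    z = [eps]
    for _ in range(n):
        z = [c for p in z for c in (2 * p - p ** 2, p ** 2)]
    return z


def info_set(z, k):
    """0-based indices of the k smallest Bhattacharyya parameters, sorted by index."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(order[:k])


def g_entry(i, j):
    """Entry (i, j), 0-based, of G = F^{(x)n}: 1 iff the 1-bits of j are among those of i."""
    return 1 if (i & j) == j else 0


def theorem1_holds(n, eps, rate):
    """True if G[i][j] == 0 for every frozen row i and information column j."""
    size = 2 ** n
    a = set(info_set(bec_bhattacharyya(n, eps), int(size * rate)))
    frozen = [i for i in range(size) if i not in a]
    return all(g_entry(i, j) == 0 for i in frozen for j in a)


a16 = [i + 1 for i in info_set(bec_bhattacharyya(4, 0.4), 8)]
print("information set for N = 16 (1-based):", a16)
for n in range(2, 9):
    print(f"N = {2 ** n:3d}: G restricted to frozen rows and information columns is zero:",
          theorem1_holds(n, 0.4, 0.5))
```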

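Using Corollary 1, the systematic encoding of polar codes
can be simplified as

$(x_{\mathcal{A}}, x_{\bar{\mathcal{A}}}) = (u_{\mathcal{A}} G_{\mathcal{A}\mathcal{A}},\; u_{\mathcal{A}} G_{\mathcal{A}\bar{\mathcal{A}}} + u_{\bar{\mathcal{A}}} G_{\bar{\mathcal{A}}\bar{\mathcal{A}}})$   (20)

The calculation of $u_{\mathcal{A}}$ in (8) can thus be simplified as
$u_{\mathcal{A}} = x_{\mathcal{A}} G_{\mathcal{A}\mathcal{A}}^{-1}$.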
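The simplified systematic encoder can be sketched as follows. Several things here are assumptions made for illustration: the frozen bits are set to zero, the index set is the one quoted in the $N = 16$ example of Section III-A, and the function names are hypothetical. The sketch also uses the observation that $G_{\mathcal{A}\mathcal{A}}$ is its own inverse over $\mathbb{F}_2$, which follows from $G^2 = I$ together with Corollary 1; the final check confirms it numerically for this example.

```python
N = 16
# Information set of the N = 16, R = 1/2, BEC(0.4) example (1-based in the text).
A = [i - 1 for i in (8, 10, 11, 12, 13, 14, 15, 16)]  # 0-based here


def g_entry(i, j):
    """Entry (i, j), 0-based, of G = F^{(x)n}: 1 iff the 1-bits of j are among those of i."""
    return 1 if (i & j) == j else 0


def systematic_encode(info_bits):
    """Return the codeword x with x_A = info_bits, assuming all frozen bits are 0."""
    u = [0] * N
    # u_A = x_A G_AA^{-1}; over GF(2), G_AA is used as its own inverse (see lead-in).
    for j in A:
        u[j] = sum(bit * g_entry(i, j) for bit, i in zip(info_bits, A)) % 2
    # Ordinary non-systematic encoding x = uG.
    return [sum(u[i] * g_entry(i, j) for i in range(N)) % 2 for j in range(N)]


info = [1, 0, 1, 1, 0, 0, 1, 0]
x = systematic_encode(info)
print("codeword:", x)
print("information positions carry the data:", [x[j] for j in A] == info)
```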

From the proof of Theorem 1, another corollary is readily
available.

Corollary 2: For any $i, j \in \mathcal{A}$ with $i \ne j$, if row $i$ intersects with
column $j$ of the generator matrix $G$ ($G_{i,j} = 1$), then
$Z(W_N^{(i)}) < Z(W_N^{(j)})$. In other words, bit channel $i$ has
a better channel quality than bit channel $j$ when $G_{i,j} = 1$.

D. Generator Matrix with Permutation 

The original generator matrix in [1] is $G_P = BF^{\otimes n}$, where
$B$ is the bit-reversal permutation matrix. We use the vector
$a$ to represent the sorted elements of $\mathcal{A}$ and the vector $b$
to represent the corresponding indices for the
systematic encoding set $\mathcal{B}$. In [25], it is pointed out that $\mathcal{B}$ is the
image of $\mathcal{A}$ under the matrix $B$, namely $b = aB$. If the
encoding of the normal polar codes is based on $G_P$, then
the submatrix $G_{\mathcal{A}\mathcal{B}}$ in (7) is $G_{\mathcal{A}\mathcal{B}} = (G_P)_{\mathcal{A}\mathcal{B}}$. With some
manipulations, it can be shown that $(G_P)_{\mathcal{A}\mathcal{B}} = G_{\mathcal{A}\mathcal{A}}$. Thus,
for systematic encoding, the generator matrix $G_P = BF^{\otimes n}$
with $b = aB$ is equivalent to $G = F^{\otimes n}$ with $\mathcal{B} = \mathcal{A}$. In
the sequel, when it comes to SC decoding, we assume
the encoding is based on the generator matrix $G_P$ so that
the natural order schedule of the decoding can be applied.
This is only for ease of description and does not affect the
performance of systematic polar codes.

III. The Theoretical Performance of Systematic
Polar Codes

In this section, we provide a general relationship between
the error performance of systematic polar codes and the error
performance of non-systematic polar codes.

Denote the BER of non-systematic polar codes as $P_b$ and
the corresponding BER of systematic polar codes as $P_{sys,b}$.
Define a set $\mathcal{A}_t \subseteq \mathcal{A}$ to contain the indices of the information
bits in error for non-systematic polar codes. Correspondingly,
the set $\mathcal{A}_{sys,t} \subseteq \mathcal{A}$ contains the indices of the information bits in
error for systematic polar codes under the SC-EN decoding.
The BER of systematic polar codes can be predicted from the
BER of non-systematic polar codes in the following way:

$P_{sys,b} = \frac{\sum_{\mathcal{A}_{sys,t} \subseteq \mathcal{A}} |\mathcal{A}_{sys,t}| \Pr\{\mathcal{A}_{sys,t}\}}{\sum_{\mathcal{A}_t \subseteq \mathcal{A}} |\mathcal{A}_t| \Pr\{\mathcal{A}_t\}} P_b$   (21)

where $|\mathcal{A}_t|$ is the cardinality of the set $\mathcal{A}_t$ and $\Pr(\cdot)$
is the probability of the event inside.

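For any given set $\mathcal{A}_t$, the set $\mathcal{A}_{sys,t}$ can be calculated from
it. From (20), we already have $x_{\mathcal{A}} = u_{\mathcal{A}} G_{\mathcal{A}\mathcal{A}}$. This says that
the values or the errors in $x_{\mathcal{A}}$ only depend on $u_{\mathcal{A}}$ and $G_{\mathcal{A}\mathcal{A}}$.
The values of the frozen bits do not affect the values or the
errors of $x_{\mathcal{A}}$. Therefore, we can convert the cardinalities of the
sets $\mathcal{A}_t$ and $\mathcal{A}_{sys,t}$ into the weights of the following vectors. Let $v$
be an $N$-element vector with 1s in the positions specified by $\mathcal{A}_t$
and 0s elsewhere, namely $v_{\mathcal{A}_t} = \mathbf{1}$. Then the cardinality
of the set $\mathcal{A}_t$ is the same as the Hamming weight of the vector
$v$, written as $w_H(v)$. In the same way, we define a vector $q$
with $q_{\mathcal{A}_{sys,t}} = \mathbf{1}$ and 0s elsewhere. We then have

$P_{sys,b} = \frac{\sum_{\mathcal{A}_{sys,t} \subseteq \mathcal{A}} w_H(q) \Pr\{\mathcal{A}_{sys,t}\}}{\sum_{\mathcal{A}_t \subseteq \mathcal{A}} w_H(v) \Pr\{\mathcal{A}_t\}} P_b$   (22)

$= \frac{\sum_{\mathcal{A}_t \subseteq \mathcal{A}} w_H(vG) \Pr\{\mathcal{A}_{sys,t}\}}{\sum_{\mathcal{A}_t \subseteq \mathcal{A}} w_H(v) \Pr\{\mathcal{A}_t\}} P_b$   (23)

The equality $q = vG$ used in equation (23) is due to the
re-encoding $\hat{x} = \hat{u}G$ after the decoding of $\hat{u}$. Note that the
operation $q = vG$ only represents the error conversion from $\hat{u}$
to $q$, not the real calculation of $\hat{x} = \hat{u}G$.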

The cardinality of $\mathcal{A}$ is $|\mathcal{A}| = NR = K$, where $R$ is the
code rate and $K$ is the number of information bits in each
code block. It is easy to verify that the number of terms in
the denominator of (23) is $2^{NR} = 2^K$. With a large block
length $N$ and a fixed code rate $R$, it is practically impossible
to evaluate the error performance of systematic polar codes
conditioned on the error performance of non-systematic
polar codes.

In this section, without considering the probabilities of
the error events $\{\mathcal{A}_t\}$, we evaluate the error performance of
systematic polar codes in two special cases to gain some initial
insight into the behavior of systematic polar codes. These
two special cases are: 1) $v_{\mathcal{A}} = \mathbf{1}$, and 2) the $e$th element
of $v$ is one: $v_e = 1$ with $e \in \mathcal{A}$. Case 1) is the situation
where all bits are in error and case 2) says only one bit is in
error. The rationale for evaluating case 1) is that
if one bit $j \in \mathcal{A}$ is in error, then theoretically this error
bit could affect all bits after it. This can be seen from the
transition probability of bit channel $i > j$ in (6): bit channel
$i$ has as its output $y_1^N$ (all received channel samples) and
$u_1^{i-1}$ (all previously decoded bits). As for case 2), it is related to the
common assumption for coded systems that errors of the code
bits in one codeword are independent and that, at high SNR,
there is only one bit in error in each codeword, resulting in
the relationship $P_b = P_B/N$, where $P_b$ is the BER and $P_B$ is
the block error rate.

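Before we analyze case 1), we need the following proposition.

Proposition 1: For a block length $N = 2^n$, $n > 0$, any
column $j$ ($1 \le j \le N$) of the generator matrix $G = F^{\otimes n}$ has
a Hamming weight of $2^{w_H(\bar{b}_n^j, \bar{b}_{n-1}^j, \ldots, \bar{b}_1^j)}$, where $(b_n^j, b_{n-1}^j, \ldots, b_1^j)$ is
the binary expansion of $j - 1$, and $\bar{b}_m^j = b_m^j \oplus 1$ over $\mathbb{F}_2$.

Proof: For a fixed column $j$, the weight of this column
is obtained by summing over all possible values of $i - 1 = (b_n^i, b_{n-1}^i, \ldots, b_1^i)$:
$\sum_i G_{i,j} = \sum_{b_n^i, \ldots, b_1^i} \prod_{m=1}^{n} (1 \oplus b_m^j \oplus b_m^i b_m^j)$.
When $b_m^j = 0$ the $m$th factor is 1 for both values of $b_m^i$, and when
$b_m^j = 1$ it is 1 only for $b_m^i = 1$, so the sum equals 2 raised to the
number of zeros in the expansion of $j - 1$. ■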
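Proposition 1 is easy to confirm by brute force. The following sketch (with hypothetical function names and 1-based column indices, as in the text) compares the counted column weight of $G = F^{\otimes n}$ with the predicted value.

```python
def column_weight(j, n):
    """Hamming weight of column j (1-based) of G = F^{(x)n}, counted directly."""
    col = j - 1
    # Row i (0-based) has a 1 in this column iff the 1-bits of col are among those of i.
    return sum(1 for i in range(2 ** n) if (i & col) == col)


def predicted_weight(j, n):
    """2 raised to the number of zeros in the n-bit expansion of j - 1."""
    zeros = n - bin(j - 1).count("1")
    return 2 ** zeros


n = 4
for j in range(1, 2 ** n + 1):
    print(f"column {j:2d}: weight {column_weight(j, n):2d}, "
          f"predicted {predicted_weight(j, n):2d}")
```

Only the last column, whose index minus one has no zero bits, has weight 1; every other column has even weight, which is the property used in the all-bits-in-error case.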

A. All Bits in Error 

From Proposition 1, it can be inferred that, except for column $N$,
the weight of every column of $G$ is even. From Theorem
1, we know that column $j \in \mathcal{A}$ of the generator matrix $G$ only
has 1s at positions specified by $\mathcal{A}$, since column $j$ does not
intersect with rows in $\bar{\mathcal{A}}$. Therefore, during the re-encoding
process $q = vG$, the vector $q_{\mathcal{A} \setminus N} = \mathbf{0}$ and $q_N = 1$.
Here $\mathcal{A} \setminus N$ means the set $\mathcal{A}$ excluding the last element $N$. The
weight of $q$ is then $w_H(q) = 1$. We see that almost all the errors
in the vector $v$ are cancelled after the re-encoding process
(with only one error remaining). If this is the only error case,
then $P_{sys,b} = \frac{1}{NR} P_b = \frac{1}{NR}$. We give an example below to
explicitly present this error cancelling process.

Suppose we are dealing with a BEC with an
erasure probability of 0.4 and $N = 16$. Let the code rate be
$R = 1/2$. The information index set can be calculated as $\mathcal{A} =
\{8, 10, 11, 12, 13, 14, 15, 16\}$. With all bits in error during the
SC decoding process, the vector $v_{\mathcal{A}} = \mathbf{1}$. The elements of
$q_{\mathcal{A}}$ can be calculated from $q_{\mathcal{A}} = (vG)_{\mathcal{A}}$. For example,

$q_8 = v_8 + v_{16} = 0$   (24)

$q_{10} = v_{10} + v_{12} + v_{14} + v_{16} = 0$   (25)

$q_{11} = v_{11} + v_{12} + v_{15} + v_{16} = 0$   (26)

With the weights of the columns of $G$ being even (excluding
column $N$) and the columns with indices in $\mathcal{A}$ only intersecting
with rows in $\mathcal{A}$, the elements of $q_{\mathcal{A}}$ (excluding $q_N$) are
sums over even numbers of elements of $v_{\mathcal{A}}$, which
result in 0s when $v_{\mathcal{A}} = \mathbf{1}$. The last element is
$q_{16} = v_{16} = 1$, which is the only error remaining after the
vector $v$ goes through the matrix $G$. The error rate is then
$P_b = 1$ and $P_{sys,b} = 1/8$.
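The example above can be reproduced with a short sketch. The names are hypothetical, and the sum in $q = vG$ is taken modulo 2; indices are 1-based to match the text.

```python
N = 16
A = [8, 10, 11, 12, 13, 14, 15, 16]  # information set of the example (1-based)


def g_entry(i, j):
    """Entry (i, j), 1-based, of G = F^{(x)4}: 1 iff the 1-bits of j-1 are among those of i-1."""
    return 1 if ((i - 1) & (j - 1)) == (j - 1) else 0


# All information bits in error: v has 1s exactly at the positions in A.
v = [1 if i in A else 0 for i in range(1, N + 1)]

# Error conversion q = vG over GF(2), restricted to the information positions.
q_A = [sum(v[i - 1] * g_entry(i, j) for i in range(1, N + 1)) % 2 for j in A]

errors_before = sum(v)
errors_after = sum(q_A)
print("q_A =", q_A)
print(f"errors before re-encoding: {errors_before}, after: {errors_after}")
print(f"P_b = {errors_before / len(A)}, P_sys,b = {errors_after / len(A)}")
```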

From this example, it is seen that the re-encoding process
$\hat{x} = \hat{u}G$ after decoding $\hat{u}$ does not amplify the number of
errors in $\hat{u}$ when all bits of $\hat{u}_{\mathcal{A}}$ are in error. Actually, in this
case the number of errors is already at its maximum and cannot
be amplified. But the number of errors does not stay the same
after the re-encoding process, as one might expect in this case.
Instead, almost all errors are cancelled after the re-encoding
process.

B. One Bit in Error

Now we return to the case with only one error, $v_e = 1$,
$e \in \mathcal{A}$. Denote the $e$th row of $G$ as $G_{e,:}$. The indices of the
corresponding error bits for systematic polar codes are the indices
of the non-zero positions of the subvector $(G_{e,:})_{\mathcal{A}}$. Therefore
the number of non-zero positions of $q_{\mathcal{A}}$ is determined by the
weight of this subvector: $w_H(q) = w_H\{(G_{e,:})_{\mathcal{A}}\}$.
Because $G$ is a lower triangular matrix, only
elements in $\{i : i \in \mathcal{A} \text{ and } i \le e\}$ of $q_{\mathcal{A}}$ are affected by this
error in $v$. In this one-error case, the number of errors could
be amplified after the re-encoding process $\hat{x} = \hat{u}G$, depending
on the location of the error. The error rate is $P_b = \frac{1}{NR}$ and
$P_{sys,b} = \frac{w_H(q)}{NR}$. In the preceding example, instead
of $v_{\mathcal{A}} = \mathbf{1}$, if we only have $v_{16} = 1$, then $q_{\mathcal{A}} = \mathbf{1}$ since
$w_H\{(G_{16,:})_{\mathcal{A}}\} = 8$, resulting in $P_b = 1/8$ and $P_{sys,b} = 1$. But
if we only have $v_8 = 1$ or $v_{10} = 1$, then we also only
have the corresponding bit in error, $q_8 = 1$ or $q_{10} = 1$, with
$P_b = P_{sys,b} = 1/8$. The number of errors in the case $v_{16} = 1$
is indeed amplified by 8 times after the re-encoding process,
while the number of errors with $v_8 = 1$ or $v_{10} = 1$ stays the
same after the re-encoding process.
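The dependence of the amplification on the error location can be tabulated for the same $N = 16$ example. This sketch uses hypothetical names and 1-based indices; it simply counts the 1s of row $e$ of $G$ that fall on information positions.

```python
A = [8, 10, 11, 12, 13, 14, 15, 16]  # information set of the N = 16 example (1-based)


def g_entry(i, j):
    """Entry (i, j), 1-based, of G = F^{(x)4}: 1 iff the 1-bits of j-1 are among those of i-1."""
    return 1 if ((i - 1) & (j - 1)) == (j - 1) else 0


def errors_after_reencoding(e):
    """w_H of row e of G restricted to the information positions."""
    return sum(g_entry(e, j) for j in A)


weights = {e: errors_after_reencoding(e) for e in A}
for e, w in weights.items():
    print(f"single error at u_{e:<2d} -> {w} error(s) in x_A, "
          f"P_b = 1/8, P_sys,b = {w}/8")
```

Because $G$ is lower triangular, a single error at position $e$ can only reach information positions with indices no larger than $e$, which is why the last position is the one that spreads to all eight systematic bits.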

IV. Systematic Polar Codes Gain 

In the discussions of Sections III-A and III-B, we have already
seen that the number of errors of polar codes with SC
decoding is not necessarily amplified in the SC-EN decoding
process of systematic polar codes. It all depends on how the
errors are distributed in the SC decoding process. This section
is devoted to analyzing the behavior of the errors in
the SC decoding process and to characterizing the advantage of
systematic polar codes over non-systematic polar codes. The
analysis is based on BEC channels. In Section VI, it is seen that
the results in this section can be extended to AWGN channels
as well.

A. Basic Error Patterns 

In order to understand how the errors are distributed with
SC decoding, we first look at the basic error patterns. The
decoding graph of polar codes with a block length $N = 2^n$
consists of $n$ columns of Z-shape sections, with each column
having $N/2$ Z-shape sections. For the connections of the
Z-shape sections in each level, please refer to [1] ifTTl. In this
subsection, we use the natural order schedule for SC
decoding as discussed in Section II-D.

The basic error patterns in the decoding graph are illustrated
in Fig. 2, where a node without any label has a correct
likelihood ratio (LR) value, a node with a label 1 has an LR
value of one, a node with a label X has an incorrect LR value,
and a node with a label ? can have a correct or incorrect LR
value depending on the context. In SC decoding, before the
first error happens, the LR values of the variable nodes in the
Z-shape sections are either correct or 1, represented by (a), (b)
and (c) of Fig. 2. We provide the proof of the error pattern in
Fig. 2(e) in the Appendix; all other patterns in Fig. 2 can
be proved in the same fashion.

The LR value of the first error bit must be one. The proof
of this fact is omitted as it is relatively simple.
In other words, the first error happens because the
decoder makes an incorrect guess, corresponding to the upper
left node in Fig. 2(a)(b)(c) and the lower left node in Fig. 2(c).

After the first error, as already pointed out, all bits after
this error bit could potentially be affected by this error. For
example, the lower left nodes in Fig. 2(d)(f)(g) are in error
because of the previous errors. These errors are clearly
errors propagated (or coupled) from previous errors. But not
all bits after the first error bit are in error, as can be seen by
observing the basic error patterns in Fig. 2. The first example
is Fig. 2(a). If the bit (or the combined bit) corresponding to
the upper left node is in error, then the LR value corresponding
to the lower left bit is still correct. Actually, the LR of the
lower left node is not affected by the upper left node since
the upper right node in Fig. 2(a) has an LR value of one.
In this case, as long as the lower right node has a correct
LR value, the lower left node can always make a correct
decision. Another example is Fig. 2(e), in which the upper
left node has an incorrect LR value and thus an incorrect bit
decision. But the incorrect bit decision cancels the effect of
the incorrect LR value of the upper right node when it comes
to the decision of the lower left node. Therefore the lower left
node can make a correct decision in this case even though the
upper left node has an incorrect decision. For a rigorous proof
of this pattern, please refer to the Appendix. There are other
cases, for example Fig. 2(g)(h), where incorrect LRs due to
incorrect previously decoded bits do not necessarily cause all
bits after those error bits to be in error. Because of these effects,
it is extremely unlikely that all bits after the first error bit
are in error, especially with large block lengths. For the same
reason, it is also unlikely that all bits after the first error bit
are correct. In other words, one bit in error, like all bits in error,
is also unlikely.

From the basic error patterns in Fig. 2, one proposition can
be easily obtained for BEC channels.

Proposition 2: For polar codes with SC decoding on
BEC channels, the number of nodes with LR = 1 stays the
same in each column of the decoding graph.

B. First Error Distribution 

As stated in the previous section, the first error happens
because the decoder makes an incorrect guess. All calculations
before the first error involve the patterns in Fig. 2(a)(b)(c). Note
that the question mark in the lower left node should be
removed before the first error, as there are no errors yet. Of
course, there is always a pattern involving two correct nodes,
which is not shown in Fig. 2.

The probability of bit $i$ being the first error is determined
by the quality of bit channel $i$, which in turn is determined by
its Bhattacharyya parameter. For a rigorous proof, please refer
to Section V-B of [1]. In this section, we present simulation
results on the first error distribution without further theoretical
discussion.

Fig. 2. Basic Error Patterns. A variable node without a label means its LR value is correct. The meanings of the labels are: X referring to an incorrect LR;
1 meaning an LR value of one; and ? referring to an LR value which could be correct or incorrect.

For BEC channels, we can precisely calculate the Bhattacharyya
parameter for each bit channel using the recursive
expressions given in [1]. Fig. 3 shows the histogram of the
indices of the first error bit and the corresponding average
Bhattacharyya parameter for $N = 2^n$ and $R = 1/2$ in a BEC
channel with an erasure probability 0.4. Fig. 3 has two y-axes:
the right axis shows the number of occurrences of the first
error and the left axis shows the value of the corresponding
Bhattacharyya parameters. As seen from Fig. 3, the probability
of the first error is indeed determined by the quality of each
bit channel. Also shown in Fig. 3 is the brick-wall nature
of the first error distribution, which is the reflection of the
polarization effect of the N channels.
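
The recursive computation of the Bhattacharyya parameters for a BEC mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard single-step recursion $Z \mapsto (2Z - Z^2,\ Z^2)$ from [1] and natural (0-based) index order; the function name `bec_bhattacharyya` is hypothetical and not taken from the paper.

```python
def bec_bhattacharyya(n, eps):
    """Bhattacharyya parameters of the N = 2**n bit channels of a BEC(eps).

    Assumption: natural index order, one polarization step maps a channel
    with parameter z to a worse channel (2z - z^2) and a better one (z^2).
    """
    z = [eps]
    for _ in range(n):
        nxt = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)  # degraded ("minus") channel
            nxt.append(zi * zi)           # upgraded ("plus") channel
        z = nxt
    return z


z = bec_bhattacharyya(3, 0.5)
# The average parameter is preserved by each step, so it stays at eps.
mean_z = sum(z) / len(z)
```

Sorting the indices by these values and keeping the smallest ones gives the information set; the bits whose parameters are not negligible are the ones that can plausibly host the first error.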

At this point, we want to point out the effect of the first
error in the non-systematic and systematic polar code scenarios.
The first error in the SC decoding process could potentially
affect all bits after it (that is, bits with larger indices under
natural-order decoding). This effect can be considered as a
forward error effect. But in the re-encoding process of the SC-EN
decoding of systematic polar codes, the errors (including
the first error) in the decoded vector $u$ only affect bits in
$x$ at or before them (bits with indices no larger than those of
the error bits), due to the lower triangularity of the generator
matrix. Correspondingly, this effect in the re-encoding process is a
backward error effect.
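
The backward error effect can be illustrated with a small sketch. It assumes the generator matrix is the plain Kronecker power $G = F^{\otimes n}$ with $F = [[1,0],[1,1]]$ (no bit-reversal permutation) and uses 0-based indices; the helper names are hypothetical.

```python
def polar_generator(n):
    """G = F^{(x)n} over GF(2), F = [[1, 0], [1, 1]], as a list of rows."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        # Kronecker step: [[g, 0], [g, g]]
        g = [row + [0] * size for row in g] + [row + row for row in g]
    return g


def affected_codeword_positions(g, i):
    """Positions j of x = uG that flip when only u_i is flipped."""
    return [j for j, bit in enumerate(g[i]) if bit]


G = polar_generator(3)
touched = affected_codeword_positions(G, 5)  # error in u at index 5
```

Because row i of a lower triangular matrix is zero to the right of the diagonal, every affected position satisfies j <= i, which is the backward direction described above.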


C. Systematic Polar Codes Gain 

In this section, we extract the first part of the right hand side 
of (I 2 TI 1 and define the reverse of it as the gain of systematic 
polar codes over non-systematic polar codes; 

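$$\gamma = \frac{\sum_{\mathcal{A}_t \subseteq \mathcal{A}} |\mathcal{A}_t| \Pr\{\mathcal{A}_t\}}{\sum_{\mathcal{A}_{sys,t}} |\mathcal{A}_{sys,t}| \Pr\{\mathcal{A}_{sys,t}\}} \qquad (27)$$

From the previous discussion in Sections III-A and III-B, we
can safely constrain the systematic gain to be strict for large
$N$:

$$1 < \gamma < NR.$$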

The analysis in Section III does not include the effect of the
coupling between errors as discussed in Section IV. From the
discussions in Section IV-A, we know the errors in previously
decoded bits could affect the bits after them, although not all
bits after the error bits are necessarily in error. With a large
$N$, there are $2^N - 1$ error combinations in the received vector
$y_1^N$. Therefore, the errors of the decoded bits after the first
decoded error can be considered as independent and identically
distributed (i.i.d.) with probability $p$ when $N$ is large. Based
on this assumption, we can convert the calculation of the
systematic gain in (27) into the analysis of a function involving
the first error distribution and the probability $p$.

Denote $p_i$ as the probability of the first error occurring at
the information bit $i$ and denote this error event as $\mathcal{E}_i$. As in
Section II, we use a vector to represent the error positions:
$v_i = 1$ if bit $i$ is in error and $v_i = 0$ otherwise. Then the
probability of the information bits being in error conditioned on
$\mathcal{E}_i$ is:

$$\Pr\{v_{\mathcal{A}} = \mathbf{1}_1^K \mid \mathcal{E}_i\} = (0, 0, \ldots, 1, p, p, \ldots, p) \qquad (28)$$

In (28), the first $(i - 1)$ probabilities are zeros because the first
error at bit $i$ doesn't affect bits before it (the forward error
effect). After bit $i$, the errors are i.i.d. with probability $p$ as
discussed previously. Since the events $\{\mathcal{E}_i\}_{i=1}^{K}$ are mutually
exclusive, the probability of the information bits in error is simply
the following summation

$$\Pr\{v_{\mathcal{A}} = \mathbf{1}_1^K\} = \sum_{i=1}^{K} p_i \Pr\{v_{\mathcal{A}} = \mathbf{1}_1^K \mid \mathcal{E}_i\} \qquad (29)$$

with the individual bit error probability
$\Pr\{v_i = 1\} = p_i + p \sum_{j=1}^{i-1} p_j$.

Fig. 3. First error histogram and the corresponding average Bhattacharyya parameter. The code block length is $N = 2^n$ and the code rate is $R = 1/2$. The
underlying channel is the BEC channel with an erasure probability 0.4. The right y-axis is for the bar plot and the left y-axis is for the stem plot. The labels
of the x-axis are the indices of elements in sorted $\mathcal{A}$, not the real values of elements in $\mathcal{A}$.

Utilizing the brick-wall property of the first error distribution
as shown in Fig. 3, we can divide the bits in $\mathcal{A}$ into
two groups: group one consisting of the error bits due to the
bad bit channel conditions and group two consisting of the
error bits purely coupled from group one. Denote these two
groups as $\mathcal{A}_I$ and $\mathcal{A}_C$, respectively. Referring to Fig. 3, the
set $\mathcal{A}_I$ includes the bits within the brick wall and the set $\mathcal{A}_C$
includes the bits outside the brick wall. Denote $K_I = |\mathcal{A}_I|$.
The probabilities can now be expressed as:

$$\Pr\{v_i = 1\} = \begin{cases} p_i + p \sum_{j=1}^{i-1} p_j, & 1 \le i \le K_I \\ p \sum_{j=1}^{K_I} p_j, & K_I < i \le K \end{cases} \qquad (30)$$

And the systematic gain can be calculated using (30) as

$$\gamma = \frac{E\{w_H(v)\}}{E\{w_H(vG)\}} \qquad (31)$$
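
The per-bit error probability $p_i + p \sum_{j<i} p_j$ can be evaluated with a running prefix sum. The following is a minimal sketch under the i.i.d. coupling assumption stated above; the function name and the example numbers are hypothetical and serve only as illustration.

```python
from itertools import accumulate


def bit_error_probs(first_err, p):
    """Pr{v_i = 1} = p_i + p * sum_{j < i} p_j for each information bit.

    first_err: first-error probabilities p_i in decoding order (assumed input).
    p: coupling probability of errors after the first error.
    """
    # prefix[i] holds the sum of p_j over all j decoded before bit i
    prefix = [0.0] + list(accumulate(first_err))[:-1]
    return [pi + p * s for pi, s in zip(first_err, prefix)]


probs = bit_error_probs([0.1, 0.2, 0.3], 0.5)
# Outside the brick wall p_i = 0, so only the coupled term remains.
walled = bit_error_probs([0.1, 0.2, 0.0, 0.0], 0.5)
```

With $p_i = 0$ for the bits outside the brick wall, the same expression reduces to the constant $p \sum_{j \le K_I} p_j$, so one formula covers both regions.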


The evaluation of (31) involves the distribution of the first
error probabilities $\{p_i\}$ of the information bits and the
probability $p$. The distribution of the probabilities $\{p_i\}$ can
be approximated by the distribution of the corresponding
Bhattacharyya parameters $\{Z(W_N^{(i)})\}$. But the combined effect of
the probability $p$ and the distribution of $\{p_i\}$ is not
fully discussed in this paper due to the space
limit. Instead, in Section V, we establish a simplified statistical
model to characterize the probabilities in (30), and this model
is used to calculate the systematic gain $\gamma$ in (31).


D. A Qualitative View of the Systematic Gain 

Using Corollary 2, we can explain qualitatively why the systematic
gain $\gamma$ should be generally larger than one, or at least why
systematic polar codes should perform as well as non-systematic polar
codes. In the re-encoding process, the estimation $x = vG$ is
performed. The $j$th entry of $x_{\mathcal{A}}$, say $x_j$ ($j \in \mathcal{A}$), is

$$x_j = v_{\mathcal{A}} G_{\mathcal{A}j} \qquad (32)$$

where $G_{\mathcal{A}j}$ is the $j$th column of $G$ with entries specified by $\mathcal{A}$.
The error correction capability of the systematic polar codes
comes from this re-encoding process in (32). To understand
this capability of systematic polar codes, we first note that the
weight of all the columns of the matrix $G$ is even except
the last column. This property of $G$ is already stated in
the beginning of Section III-A. This is where the theoretical
maximum $\gamma = NR$ comes from.

From (32), it's seen that the errors in $v_{\mathcal{A}}$ can only affect $x_j$
at positions where $G_{\mathcal{A}j}$ has non-zero entries. From Corollary
2, it's known that a non-zero entry of column $j$, $G_{ij} = 1$,
means a better bit channel $i$ than $j$. Let's call the set of bits
$\{i : i \neq j,\ i \in \mathcal{A} \text{ and } G_{ij} = 1\}$ the compatible bits of the
information bit $j$. For bit $x_j$, only bit $j$ and its compatible bits
affect the decision. Since the compatible bits of bit $j$ are transmitted
over better bit channels than $j$, it's more likely that bit $j$ is in
error and that the compatible bits are in error due to the error
propagation from bit $j$. In other words, the errors of bit $j$ and
its compatible bits are coupled. The re-encoding process
$x_j = v_{\mathcal{A}} G_{\mathcal{A}j}$ is equivalent to summing over bit $j$ and its
compatible bits, a process that averages out the coupled errors. This
mechanism of the re-encoding process leads to the fact that systematic
polar codes perform at least as well as non-systematic polar
codes, or $\gamma \geq 1$.
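
The column-weight property and the notion of compatible bits can be checked numerically on the full matrix. The sketch below assumes $G = F^{\otimes n}$ without bit reversal, for which $G_{ij} = 1$ exactly when the binary digits of $j$ are a subset of those of $i$ (0-based indices); the helper names and the example information set are hypothetical.

```python
def g_entry(i, j):
    """Entry G_ij of F^{(x)n}, F = [[1, 0], [1, 1]] (0-based indices)."""
    return 1 if (i & j) == j else 0


def column_weight(n, j):
    """Hamming weight of column j of the 2**n x 2**n matrix G."""
    return sum(g_entry(i, j) for i in range(2 ** n))


def compatible_bits(j, info_set):
    """Information bits i != j whose errors enter x_j through G_ij = 1."""
    return [i for i in info_set if i != j and g_entry(i, j)]


weights = [column_weight(3, j) for j in range(8)]
info_set = [3, 5, 6, 7]               # example information set for N = 8
partners = compatible_bits(3, info_set)
```

Every column except the last has an even weight, and the compatible bits of a given bit always carry larger indices, in line with the lower triangular structure.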

V. Composite Error Model 

So far we are still short of an efficient way to calculate
the systematic gain $\gamma$. In this section, we establish a statistical
model to simplify the probabilities of the errors in (30). This
simplified model is then used to calculate the systematic gain
$\gamma$.

We define a new set $\mathcal{S}$ as the ensemble of the error events
$\mathcal{A}_t$:

$$\mathcal{S} = \bigcup_{\mathcal{A}_t \subseteq \mathcal{A}} \mathcal{A}_t \qquad (33)$$

Considering the basic error patterns in Fig. 2, the errors
could happen to any bit after the first error bit, no matter
which bit channel the bit experiences. Therefore, the set $\mathcal{S}$
almost surely consists of all the information bits from
the first information bit with a non-negligible Bhattacharyya
parameter onwards. For this, we set a threshold $\alpha$, below which the
Bhattacharyya parameter is considered negligible. Otherwise,
the Bhattacharyya parameter is considered large.
Define a set consisting of all the indices of the bit channels
with non-negligible Bhattacharyya parameters as

$$\mathcal{I} = \{i : i \in \mathcal{A} \text{ and } Z(W_N^{(i)}) > \alpha\} \qquad (34)$$

As stated in Section II-B, the set $\mathcal{A}$ is sorted in ascending order
according to the index values. So is the set $\mathcal{I}$. This set $\mathcal{I}$ is
used to define the boundaries of the brick wall in Fig. 3. The
first element (with the smallest index value) in $\mathcal{I}$ is denoted
as $I_1$. Then $\mathcal{S}$ can be written as

$$\mathcal{S} = \{a : a \in \mathcal{A} \text{ and } a \geq I_1\} \qquad (35)$$

When $I_1$ happens to be also the first element of $\mathcal{A}$, then
$\mathcal{S} = \mathcal{A}$. The elements of $\mathcal{S}$ are also sorted according to the index
values, as are the elements in the set $\mathcal{A}$.
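
The construction of the sets in (34) and (35) can be sketched for a BEC as follows. This is an illustration under stated assumptions: the information set is taken as the K indices with the smallest Bhattacharyya parameters, indices are 0-based (the paper counts from one), and the function names and the threshold value used in the example are hypothetical.

```python
def bec_bhattacharyya(n, eps):
    """Bhattacharyya parameters of the 2**n bit channels of a BEC(eps)."""
    z = [eps]
    for _ in range(n):
        z = [v for zi in z for v in (2 * zi - zi * zi, zi * zi)]
    return z


def composite_support(z, k, alpha):
    """Return (A, I, S): information set, non-negligible set, composite set."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    info = sorted(order[:k])                       # A, ascending indices
    large = [i for i in info if z[i] > alpha]      # I as in (34)
    if not large:
        return info, large, []                     # no plausible first error
    first = large[0]                               # I_1
    support = [a for a in info if a >= first]      # S as in (35)
    return info, large, support


z = bec_bhattacharyya(3, 0.5)
A, I, S = composite_support(z, k=4, alpha=0.01)
```

In this small example the first element of $\mathcal{I}$ coincides with the first element of $\mathcal{A}$, so the composite set equals the information set.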

The next step in defining $\mathcal{S}$ is to assign each element in $\mathcal{S}$
a probability of being in error. Following the discussions in
Section IV-C, we perform the following steps on $\mathcal{S}$:

• Divide the set $\mathcal{S}$ into two sections: the first section,
denoted as $\mathcal{S}_1$, being the region where the first error could
happen, and the second section, $\mathcal{S}_2$, being the region
where errors are coupled or induced from region one.

• The composite effect of $\cup \mathcal{A}_t$ in the first region $\mathcal{S}_1$ is
represented by the error probability of the first equation of
(30).

• The composite effect of $\cup \mathcal{A}_t$ in the second region $\mathcal{S}_2$ is
represented by the error probability of the second equation
of (30).

From the second equation of (30), it's known that all bits in
$\mathcal{S}_2$ have the same probability of error from a composite point
of view, and this probability of error should be larger than the
probability of error in $\mathcal{S}_1$ due to the following observation.
In region one, a bit with index $a_1$ could be in error in
one error event $\mathcal{A}_1 \subseteq \mathcal{A}$ with the first error bit $e_1 < a_1$,
but will for sure be correctly decoded in another error event
$\mathcal{A}_2 \subseteq \mathcal{A}$ when $a_1 < e_2$, with $e_2$ being the index of the first
error bit in event $\mathcal{A}_2$. In region two, any bit can potentially be
decoded incorrectly in any error event. So statistically, the
bits in region $\mathcal{S}_2$ have a higher probability of being in error
when considering the composite effects of $\cup \mathcal{A}_t$. This condition
translates to the way we select the probability $p$ in (30).
However, as we point out in Section IV-C, it's theoretically
difficult to precisely calculate the probability of error for each
element in $\mathcal{S}$. With the above observation, we propose the
following simplified model in place of the precise model:

$$\mathcal{S}_1 = \{a : a \in \mathcal{S},\ a \leq I_m,\ \Pr\{a \text{ is in error}\} = p_0\} \qquad (36)$$

$$\mathcal{S}_2 = \{a : a \in \mathcal{S},\ a > I_m,\ \Pr\{a \text{ is in error}\} = 1\} \qquad (37)$$

where $I_m$ is the last element in $\mathcal{I}$. What this model says is the
following: the mean effect of $\cup \mathcal{A}_t$ is that the information bits
with indices in $\mathcal{S}_1$ are in error statistically independently,
each with probability $p_0$. The rest of the information bits are
in error with probability one from a composite point of
view. Although a precise probability $p_0$ is not
pursued in this paper, empirically we find that $p_0 = 1/2$ is a very good
approximation.

An important parameter of the model in (36)-(37) is the
boundary $I_m$. It's clear that this boundary element $I_m$ is
related to the channel $W$. For example, with a BEC channel,
when the block length $N$ and the code rate $R$ are fixed, the
boundary $I_m$ is related to the erasure probability. With a large
erasure probability, there are more bits in error due
to the channel itself and fewer bits in error due to the forward
error effect, and vice versa. Without going into the details of
calculating the Bhattacharyya parameters of the bit channels
(which is only possible for BEC channels), we can use a
coupling coefficient to calculate another boundary element
$I_{m'} \in \mathcal{S}$. The coupling coefficient here means the fraction
of incorrect information bits due to the previously incorrectly
decoded information bits. Denote the coupling coefficient as
$\beta$; the element $I_{m'}$ is the $m'$th element of $\mathcal{S}$, where

$$m' = \lfloor |\mathcal{S}| (1 - \beta) \rfloor \qquad (38)$$

Then we can use $I_{m'}$ to replace the boundary element $I_m$ in the
model (36)-(37). This boundary based on the coupling coefficient
is especially useful for bit channels whose Bhattacharyya
parameters are not readily available.
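
The boundary rule in (38) amounts to cutting the sorted composite set at position $m'$. A minimal sketch, assuming the floor reading of (38) and that the $m'$th element itself belongs to region one; the function name is hypothetical.

```python
import math


def split_by_coupling(support, beta):
    """Split the sorted composite set S into (S1, S2) using (38).

    support: elements of S in ascending index order.
    beta: coupling coefficient, the fraction of coupled error bits.
    """
    m = math.floor(len(support) * (1 - beta))  # m' in (38)
    return support[:m], support[m:]


S1, S2 = split_by_coupling(list(range(8)), 0.5)
```

A larger coupling coefficient moves the boundary toward the start of $\mathcal{S}$, so that more bits are treated as purely coupled errors.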

Note that this simple model in (36)-(37) can approximate the
composite effect of $\cup \mathcal{A}_t$ only in the statistical sense and it only
models the dominant effect (or the mean effect) of $\cup \mathcal{A}_t$. It is
not, by any means, an exact error event $\mathcal{A}_t \subseteq \mathcal{A}$.


A. Calculation of the Systematic Gain 

With the composite error model in (36)-(37), we can calculate
the systematic gain. Use the same $N$-element vector $v$ as
an error indicator vector of $\mathcal{S}$: the $i$th entry of $v$ is zero if
$i \notin \mathcal{S}$; otherwise $v_i$ is one if $i \in \mathcal{S}$ and the $i$th bit is in error.
The subvector corresponding to region two of $\mathcal{S}$ is $v_{\mathcal{S}_2} = \mathbf{1}$,
as seen from (37). Each element of the subvector $v_{\mathcal{S}_1}$ takes a value
in $\{0, 1\}$, being one with probability $p_0$ as shown in (36). The
systematic gain from the composite model is thus

$$\gamma = \frac{E\{w_H(v)\}}{E\{w_H(vG)\}} \qquad (39)$$


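The mean weight of $v$ can be easily calculated as
$E\{w_H(v)\} = p_0 |\mathcal{S}_1| + |\mathcal{S}_2|$. Now we need to calculate the mean
weight of $x_{\mathcal{S}} = vG_{\mathcal{SS}}$, which can be decomposed as
$(x_{\mathcal{S}_1}, x_{\mathcal{S}_2}) = v(G_{\mathcal{S}\mathcal{S}_1}, G_{\mathcal{S}\mathcal{S}_2})$. Due to the lower triangularity
of the matrix $G_{\mathcal{SS}}$, the weight of $x_{\mathcal{S}_2}$ can be directly calculated as

$$w_H\{x_{\mathcal{S}_2}\} = w_H\{vG_{\mathcal{S}\mathcal{S}_2}\} = w_H\{v_{\mathcal{S}_2} G_{\mathcal{S}_2\mathcal{S}_2}\} = 1 \qquad (40)$$

which uses the even weight property of the columns of $G$
except the last column. The first part $x_{\mathcal{S}_1} = vG_{\mathcal{S}\mathcal{S}_1}$ can be
further divided into the summation of two parts:

$$x_{\mathcal{S}_1} = v_{\mathcal{S}_1} G_{\mathcal{S}_1\mathcal{S}_1} + v_{\mathcal{S}_2} G_{\mathcal{S}_2\mathcal{S}_1} \qquad (41)$$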

The second part in (41) is a deterministic vector since $v_{\mathcal{S}_2}$ is
the all-one vector. With $G_{\mathcal{S}_1\mathcal{S}_1}$ an invertible lower triangular
matrix, the vector $x_{\mathcal{S}_1}$ belongs to the row space of the matrix
$G_{\mathcal{S}_1\mathcal{S}_1}$. Thus it can be formed by another vector $v'_{\mathcal{S}_1}$ in the
identity basis of the row space of $G_{\mathcal{S}_1\mathcal{S}_1}$:

$$x_{\mathcal{S}_1} = v'_{\mathcal{S}_1} I \qquad (42)$$

with $v'_{\mathcal{S}_1}$ defined in the same way as $v_{\mathcal{S}_1}$. Therefore the mean
weight of $x_{\mathcal{S}_1}$ is the same as the mean weight of $v'_{\mathcal{S}_1}$, which
is

$$E\{w_H\{x_{\mathcal{S}_1}\}\} = p_0 |\mathcal{S}_1| \qquad (43)$$


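The systematic gain is then

$$\gamma = \frac{p_0 |\mathcal{S}_1| + |\mathcal{S}_2|}{1 + p_0 |\mathcal{S}_1|} \qquad (44)$$

When the cardinality of $\mathcal{S}_1$ is quite large, the systematic gain
can be approximated as:

$$\gamma \approx 1 + \frac{|\mathcal{S}_2|}{p_0 |\mathcal{S}_1|} \qquad (45)$$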
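
The closed form (44) and its approximation (45) are straightforward to evaluate. The sketch below uses hypothetical function names and example set sizes chosen only for illustration.

```python
def systematic_gain(size_s1, size_s2, p0=0.5):
    """Systematic gain from the composite model, as in (44)."""
    return (p0 * size_s1 + size_s2) / (1 + p0 * size_s1)


def systematic_gain_approx(size_s1, size_s2, p0=0.5):
    """Large-|S1| approximation of the systematic gain, as in (45)."""
    return 1 + size_s2 / (p0 * size_s1)


# Example: |S| = 512 split evenly, i.e. a coupling coefficient of 0.5.
exact = systematic_gain(256, 256)
approx = systematic_gain_approx(256, 256)
```

For an even split and $p_0 = 1/2$, the approximation gives exactly 3 and the exact expression is slightly below it, which is in line with the gain of around 3 reported in the numerical results.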


An immediate conclusion from (45) is that the systematic gain
is greater than one, meaning that systematic polar codes should
perform better than the corresponding non-systematic polar
codes. Another interpretation of (45) is that the systematic
gain is only determined by the ratio of cardinalities of the two
sets $\mathcal{S}_1$ and $\mathcal{S}_2$. It does not increase with the increase of the
block length as one would intuitively expect. This property
of the systematic polar codes is verified in the simulations in
Section VI.


VI. Numerical Results 

In this section, numerical examples for both BEC channels
and AWGN channels are provided to validate the results in
Sections IV and V. The encoding for BEC channels is done
through the selection of the bit channels with the smallest
Bhattacharyya parameters. For AWGN channels, we still use
the same recursive formula for calculating the Bhattacharyya
parameters of BEC channels in encoding. We emphasize that
this encoding serves our purpose just as well, as long as it's
consistent for both non-systematic polar codes and systematic
polar codes.

Fig. 4 is the result in the BEC channel for $N = 2^{10}$ and
$R = 1/2$. Several curves are shown in Fig. 4. The curve of
the starred dotted line is the BER of the non-systematic polar
codes under the SC decoding. The legend for this curve is
‘SC’. The curve of the dash dotted line with triangles is the
BER of the systematic polar codes with the SC-EN decoding,
for which the legend is ‘SYSTEMATIC’. The circled solid
line is the theoretical BER for systematic polar codes from
the model in (36)-(37). Also shown in Fig. 4 is the BER of the
non-systematic polar codes with the belief-propagation (BP)
decoding (the curve of the dashed line with diamonds).

The theoretical BER for systematic polar codes in Fig. 4 is
generated using two different coupling coefficients: $\beta = 0.3$
when the erasure probability is larger than 0.45 and $\beta = 0.5$
when the erasure probability is smaller than 0.45. This choice
of the coupling coefficient corresponds to the bad channel
condition and the good channel condition, respectively. The
probability of independent error in $\mathcal{S}_1$ is $p_0 = 1/2$ and it's used
in all of the following theoretical calculations. The threshold
$\alpha$ in determining the set $\mathcal{I}$ in (34) is set to be $\alpha = 10^{-\ldots}$.
Under this setting, the first element of $\mathcal{I}$ is $I_1 = 192$ when the
erasure probability is 0.4, which is also the first element of $\mathcal{A}$.
Thus the composite set is $\mathcal{S} = \mathcal{A}$ in this case. The systematic
gain calculated from the composite set $\mathcal{S}$ is quite stable, so a
small number of realizations can be used in averaging this systematic
gain. In Fig. 4, only ten realizations are used in calculating the
theoretical systematic gain $\gamma$. The simulated BER and the BER
from the model in (36)-(37) match quite well, showing that the
simple model in (36)-(37) can approximate the dominant error
events of $\cup \mathcal{A}_t$ and thus can be used to calculate the systematic
gain.

Also shown in Fig. 4 is the BER for non-systematic
polar codes with the BP decoding. BP decoding is generally
better than the SC decoding as shown in iflTl . With a bad 
channel condition, for example, with an erasure probability
larger than 0.45, BP decoding performs almost the same as
the SC decoding. Systematic polar codes, however, perform
two to three times better than both SC and BP decoding
under the same channel conditions, at a cost almost negligible
compared to the complexity of the BP decoding. At better
channel conditions, BP decoding starts to show its advantage.

We observe the same phenomenon in Fig. 5 as in Fig. 4
for $N = 2^{12}$ and $R = 1/2$. The curves in Fig. 5 have the
same style and labels as Fig. 4. The coupling coefficient is
set the same as in the case $N = 2^{10}$ and $R = 1/2$. Again, the
simulated systematic gain embedded in the BER of systematic
polar codes matches that calculated using the composite set $\mathcal{S}$.

Shown in Fig. 6 is the BER for $N = 2^{10}$ and $R = 1/4$
in the AWGN channel. The composite set is $\mathcal{S} = \mathcal{A}$. The
coupling coefficient is set in the following way: for SNR
smaller than -1.5 dB, $\beta = 0.3$; for SNR larger than -1.5
dB, $\beta = 0.5$. The systematic gain calculated from the model
in (36)-(37) matches that from the simulations, showing that
the composite model in (36)-(37) can also be used for AWGN
channels.

From Fig. 4 to Fig. 6, we see that systematic polar codes
perform consistently better than non-systematic polar codes, 
echoing the results in ll25l . The systematic gain for different 
block lengths is shown in Fig. 7. The underlying channel
$W$ is a BEC channel with an erasure probability 0.4. The
gain represented by the circled line (with a legend ‘Sys Gain
Sim’) is simulated. The gain shown by the starred line (with a
legend ‘Sys Gain Theoretical’) is calculated using (44). The
systematic gain calculated using (44) is accurate when $N$ is
large. It's seen from Fig. 7 that the systematic gain increases
with the increase of the block lengths but saturates at around
$\gamma = 3$ when $N > 2^{\ldots}$. This coincides with the simulation
results in Figs. 4 to 6. The saturating nature of the systematic
gain can be seen from the composite set $\mathcal{S}$: with a fixed code
rate $R$, as the block length $N$ increases, the cardinality of $\mathcal{S}$
also increases. So the increase in the error-correction capability
of the systematic polar codes is counteracted by the increase
in the number of error bits, so that the systematic gain
reaches a limit.




Fig. 4. BER for n = 10, R = 1/2 in BEC channel.

Fig. 5. BER for n = 12, R = 1/2 in BEC channel.

Fig. 6. BER for n = 10, R = 1/4 in AWGN channel.

Fig. 7. Systematic gain $\gamma$ for R = 1/2 in BEC channels with different block lengths. The erasure probability is set the same for all block lengths as 0.4.


VII. Conclusion 

In this paper, we analyze the error performance of systematic
polar codes with the SC-EN decoding. Through the
analysis of the generator matrix of polar codes, the encoding
process of systematic polar codes is simplified. We use a
parameter, the systematic gain, to characterize the performance 
of systematic polar codes as compared with the non-systematic 
polar codes. From the study of the basic error patterns and the 
first error distribution of the SC decoding, the information bits 
are divided into two regions and the probability of errors in 
each region is provided. To further use the properties of these 
two regions, we propose a composite model to approximate 
the mean effect of the error events in the SC decoding. Using 
this composite model, the systematic gain can be calculated. 
Numerical results are provided and our models are verified in 
the paper. Systematic polar codes are shown to be around 3 
times better than non-systematic polar codes in terms of the 
BER performance with large block lengths. 


Appendix 

Proof of the Error Patterns 


We provide the proof of the error pattern in Fig. 2(e). Let's
assume the two bits at the input to the Z-section are $u_1$ and
$u_2$. The output is then $x_1 = u_1 \oplus u_2$ and $x_2 = u_2$. In this
pattern, the LR value of $x_1$ is incorrect, namely
$LR(x_1) = LR(u_1 \oplus u_2 \oplus 1)$. In estimating $u_1$, we have

$$LR(\hat{u}_1) = \frac{1 + LR(u_1 \oplus u_2 \oplus 1)\, LR(u_2)}{LR(u_1 \oplus u_2 \oplus 1) + LR(u_2)} \qquad (46)$$

Compared with the true estimation

$$LR(\bar{u}_1) = \frac{1 + LR(u_1 \oplus u_2)\, LR(u_2)}{LR(u_1 \oplus u_2) + LR(u_2)} \qquad (47)$$

and using $LR(u \oplus 1) = LR(u)^{-1}$, the right-hand side of (46) is the
reciprocal of that of (47), so it's readily seen that
$\hat{u}_1 = \bar{u}_1 \oplus 1$. Therefore the LR value of
the variable node $u_1$ is incorrect, as indicated by an X in the
upper left node in Fig. 2(e).

After obtaining the estimation $\hat{u}_1$, the LR value of bit $u_2$
is given by

$$LR(\hat{u}_2) = LR(u_2)\, LR(u_1 \oplus u_2 \oplus 1)^{1 - 2\hat{u}_1} \qquad (48)$$

Substituting $\hat{u}_1 = \bar{u}_1 \oplus 1$ into (48) and using the fact that
$LR(u_1 \oplus 1) = LR(u_1)^{-1}$, we obtain the following

$$LR(\hat{u}_2) = LR(u_2)\, LR(u_1 \oplus u_2)^{-1 + 2(\bar{u}_1 \oplus 1)} \qquad (49)$$

Again, comparing with the true estimation of $u_2$

$$LR(\bar{u}_2) = LR(u_2)\, LR(u_1 \oplus u_2)^{1 - 2\bar{u}_1} \qquad (50)$$

we can verify that (49) and (50) are equivalent, since
$-1 + 2(\bar{u}_1 \oplus 1) = 1 - 2\bar{u}_1$ for $\bar{u}_1 \in \{0, 1\}$, meaning the
estimation of $u_2$ is the true estimation, which is the lower left
node in Fig. 2(e).
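
The two identities used in this proof can be checked numerically. The sketch below assumes the LR update rules in the form written in (46) to (50); the function names and the numeric LR values are hypothetical and serve only as a sanity check.

```python
def lr_u1(lr_x1, lr_x2):
    """LR of u1 from the LRs of x1 = u1 xor u2 and x2 = u2."""
    return (1 + lr_x1 * lr_x2) / (lr_x1 + lr_x2)


def lr_u2(lr_x1, lr_x2, u1_decision):
    """LR of u2 given the decision on u1 (0 or 1)."""
    return lr_x2 * lr_x1 ** (1 - 2 * u1_decision)


a, b = 3.0, 0.4            # example LR values of x1 and x2
wrong_a = 1 / a            # incorrect LR of x1: LR(x1 xor 1) = 1 / LR(x1)

# An incorrect LR of x1 inverts the LR of u1, which flips its decision.
product = lr_u1(wrong_a, b) * lr_u1(a, b)

# The flipped decision cancels the incorrect LR when estimating u2.
diffs = [abs(lr_u2(wrong_a, b, 1 - u1) - lr_u2(a, b, u1)) for u1 in (0, 1)]
```

The product of the two LRs of $u_1$ equals one, and the LR of $u_2$ is unchanged for either value of the true decision, which mirrors the equivalence of (49) and (50).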